Hybridized Dimensionality Reduction Method for Machine Learning based Web Pages Classification
Abstract
High dimensionality of the feature space is a well-known problem in the text classification and web mining domains; it is caused mainly by the large vocabulary contained within documents. Several methods have been applied over the years to select the most useful features; however, the performance of such methods can still be improved in aspects such as computational cost and accuracy. This research presents an enhanced cosine similarity-based hybridization of two efficient feature selection methods for higher classification performance. The reduced feature sets are generated using the Random Projection (RP) and Principal Component Analysis (PCA) methods, individually, and are then hybridized based on the cosine similarity values between the features' vectors. The performance of the proposed method, in terms of accuracy and F-measure, was tested on a dataset of web pages using several term weighting schemes. Compared to relevant methods, the results of the proposed method show significantly higher accuracy and F-measure with a smaller feature set size.
Index Terms: Cosine similarity, Dimensionality Reduction, Feature selection, PCA, Random Projection.
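The pipeline described in the abstract (reduce with RP and with PCA separately, then combine the two reduced sets using cosine similarity between feature vectors) might be sketched roughly as follows. The abstract does not state the actual merge rule, so the rule in `hybridize` is an assumption made purely for illustration: keep every PCA feature and add only those RP features that are not close, in cosine similarity, to any PCA feature. The function names, the `threshold` parameter, and the toy matrix are all hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def rp_reduce(X, k, seed=0):
    """Project the rows of X with a Gaussian random matrix scaled by 1/sqrt(k)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R


def cosine_similarity_columns(A, B):
    """Cosine similarity between every column of A and every column of B."""
    A_unit = A / np.maximum(np.linalg.norm(A, axis=0), 1e-12)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=0), 1e-12)
    return A_unit.T @ B_unit


def hybridize(X, k, threshold=0.5, seed=0):
    """ASSUMED merge rule (not from the paper): keep all PCA features, then
    add each RP feature whose largest absolute cosine similarity to any PCA
    feature is below `threshold`."""
    Z_pca = pca_reduce(X, k)
    Z_rp = rp_reduce(X, k, seed)
    sim = np.abs(cosine_similarity_columns(Z_rp, Z_pca))  # shape (k, k)
    keep = sim.max(axis=1) < threshold
    return np.hstack([Z_pca, Z_rp[:, keep]])


# Toy stand-in for a term-weighted document-by-term matrix (20 docs, 50 terms).
X = np.random.default_rng(42).random((20, 50))
Z = hybridize(X, k=5)
print(Z.shape)  # 20 rows; between 5 and 10 columns, depending on the threshold
```

With this assumed rule the hybrid set has between `k` and `2k` features. A lower `threshold` discards more RP features as redundant, so the threshold trades feature set size against the extra information retained from the random projection.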
Similar resources
Dimensionality Reduction for Colour Based Pixel Classification
In digital images, providing classification based on colour, hue or spectral angle is a problem usually solved by combining a variety of pre-processing steps, as well as object wise classifiers. We have developed a method for transforming colour or multispectral image data to a 1D colour histogram with respect to the digital characteristics of intensity measurements. Classification is then redu...
Georeferencing Semi-Structured Place-Based Web Resources Using Machine Learning
In recent years, shared content on the web has grown significantly. A large part of this information is publicly available in the form of semi-structured data. Moreover, a significant amount of this information is related to place. Such information refers to a location on the earth but does not contain any explicit coordinates. In this research, we tried to georefer...
Approach for Dimensionality Reduction in Web Page Classification
Dimensionality refers to the number of terms in a web page. When classifying web pages, high dimensionality causes problems. The main objective of reducing the dimensionality of web pages is to improve the performance of the classifier. Processing time and accuracy are two parameters that influence the performance of a classifier. To reduce the processing time, less informative and redundant te...
Building an asynchronous web-based tool for machine learning classification
Various unsupervised and supervised learning methods including support vector machines, classification trees, linear discriminant analysis and nearest neighbor classifiers have been used to classify high-throughput gene expression data. Simpler and more widely accepted statistical tools have not yet been used for this purpose, hence proper comparisons between classification methods have not bee...
Journal
Journal title: ?????? ???????? ?????? ???????? ?????????? ???????? ??????
Year: 2022
ISSN: 2617-3352, 1811-9212
DOI: https://doi.org/10.33103/uot.ijccce.22.3.9